Monotone multi-armed bandit allocations

Author

  • Aleksandrs Slivkins
Abstract

We present a novel angle for multi-armed bandits (henceforth abbreviated MAB) which follows from the recent work on MAB mechanisms (Babaioff et al., 2009; Devanur and Kakade, 2009; Babaioff et al., 2010). The new problem is, essentially, about designing MAB algorithms under an additional constraint motivated by their application to MAB mechanisms. This note is self-contained, although some familiarity with MAB is assumed; we refer the reader to Cesa-Bianchi and Lugosi (2006) for more background.
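Since the note assumes some familiarity with MAB, a minimal sketch of a standard index-based algorithm may help as background. The code below implements UCB1 on Bernoulli arms; it is generic background only, not the monotone allocation rule studied in this paper, and the arm means are hypothetical.

```python
import math
import random

def ucb1(means, horizon, rng):
    """Minimal UCB1 on Bernoulli arms: play each arm once, then always
    pull the arm with the highest index  mean + sqrt(2 ln t / n)."""
    k = len(means)
    counts = [0] * k      # pulls per arm
    totals = [0.0] * k    # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization round: each arm once
        else:
            arm = max(range(k),
                      key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.3, 0.7], horizon=2000, rng=random.Random(0))
# the 0.7 arm should receive the large majority of the pulls
```

The allocation constraint motivating this note concerns how such a rule behaves as a component of a mechanism; the sketch only fixes the underlying exploration-exploitation loop.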


Related articles

An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance

Consider a requester who wishes to crowdsource a series of identical binary labeling tasks from a pool of workers so as to achieve an assured accuracy for each task, in a cost-optimal way. The workers are heterogeneous, with unknown but fixed qualities, and their costs are private. The problem is to select an optimal subset of the workers to work on each task so that the outcome obtained...


The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection

The multiarmed bandit is often used as an analogy for the tradeoff between exploration and exploitation in search problems. The classic problem involves allocating trials to the arms of a multiarmed slot machine to maximize the expected sum of rewards. We pose a new variation of the multiarmed bandit—the Max K-Armed Bandit—in which trials must be allocated among the arms to maximize the expecte...
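To make the contrast between the two objectives concrete, the sketch below simulates a hypothetical two-arm instance: a steady arm with a higher mean, which wins under the classic sum-of-rewards objective, and a risky high-variance arm whose single best draw tends to be larger, which wins under the max objective. All arm parameters are illustrative.

```python
import random

def run_arm(mean, sd, pulls, rng):
    """Draw `pulls` Gaussian rewards from one arm and report both
    objectives: the classic sum of rewards and the single max reward."""
    rewards = [rng.gauss(mean, sd) for _ in range(pulls)]
    return sum(rewards), max(rewards)

rng = random.Random(1)
sum_a, max_a = run_arm(mean=1.0, sd=0.1, pulls=1000, rng=rng)  # steady arm
sum_b, max_b = run_arm(mean=0.8, sd=2.0, pulls=1000, rng=rng)  # risky arm
# Under the sum objective the steady arm wins (sum_a > sum_b), while the
# risky arm almost surely yields the largest single reward (max_b > max_a).
```

The two objectives therefore favor different arms on the same instance, which is why the Max K-Armed Bandit calls for its own allocation strategy.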


A novel ex-post truthful mechanism for multi-slot sponsored search auctions

In this paper, we advance the state of the art in designing ex-post truthful multi-armed bandit (MAB) mechanisms for multi-slot sponsored search auctions (SSA) through two different contributions. First, we prove two important impossibility results which rule out the possibility of an ex-post monotone MAB allocation rule having sublinear regret with time when the click-through rates (CTRs) of the...


Decision Maker using Coupled Incompressible-Fluid Cylinders

The multi-armed bandit problem (MBP) is the problem of finding, as accurately and quickly as possible, the most profitable option from a set of options that gives stochastic rewards by referring to past experiences. Inspired by the fluctuating movements of a rigid body in a tug-of-war game, we formulated a unique search algorithm that we call the ‘tug-of-war (TOW) dynamics’ for solving the MBP effic...


Online Submodular Set Cover, Ranking, and Repeated Active Learning

We propose an online prediction version of submodular set cover with connections to ranking and repeated active learning. In each round, the learning algorithm chooses a sequence of items. The algorithm then receives a monotone submodular function and suffers loss equal to the cover time of the function: the number of items needed, when items are selected in order of the chosen sequence, to ach...



Publication year: 2011